Genome Assembly, Variant Calling and Phasing

A fragment-level assembly of Capsella bursa-pastoris was undertaken with
the aim of generating kilobase-range sequences that would be informative for a
gene-level realignment against Capsella rubella, rather than a comprehensive
genome assembly per se. The low sequence divergence between homeologous
sequences poses challenges for assembly and argued for a relatively conservative
approach; the Ray assembler (1) was therefore selected over more
aggressive assemblers (e.g. Velvet, EBI). Approximately 20 million 2x105 nt Illumina
read pairs (4.3 Gb) from two European accessions (12.4 and 16.9) were 3'
trimmed to remove sequence with a quality <30, and contigs were then generated
with the Ray assembler v 1.4 (1) using a k-mer length of 31. A marginal degree of
further scaffolding was then undertaken with the SOAPdenovo assembler (BGI, v
1.05) using the same paired-end data and a k-mer length of 51; however, this only
slightly extended the Ray contigs. The final scaffold N50 was 2.5 kb (0.23%
uncalled bases) and the contig N50 was 2.4 kb, with a maximum scaffold/contig length of
~25 kb. The total assembly size was 210.5 Mb, with 83% of the assembly present
in sequences longer than 0.5 kb.
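The summary statistics quoted above (N50 and the fraction of the assembly in sequences longer than 0.5 kb) can be illustrated with a minimal Python sketch. The function names are hypothetical and the input is assumed to be a plain list of contig or scaffold lengths in bp; this is not the script used for the published figures.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def fraction_in_long_sequences(lengths, min_length=500):
    """Fraction of assembled bases in sequences longer than min_length bp."""
    total = sum(lengths)
    return sum(l for l in lengths if l > min_length) / total
```

Applied to the scaffold lengths of an assembly, the first function gives the scaffold N50 and the second the proportion of bases in sequences above the chosen length threshold.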

For polymorphism analyses, Illumina reads for both C. orientalis and C. bursa-pastoris
were mapped to the Capsella rubella reference genome (2) using Stampy
version 13 (3), and Picard (picard.sourceforge.net) was used for read sorting and
file format conversion. SNPs and short indels were genotyped using the
Genome Analysis Toolkit (GATK) software package (4, 5), with realignment of
sequences surrounding short indels (6). The resulting polymorphism data were
combined with similarly processed whole-genome polymorphism data from 13 C.
grandiflora individuals (7, 8).

To minimize the risk of spurious SNP calling, each site in the VCF files was
filtered by PHRED quality score (threshold of 40) and by depth. For Capsella bursa-pastoris
and C. grandiflora, the VCF depth cut-offs were a minimum of 20 and a maximum of
100; for C. orientalis, they were a minimum of 15 and a maximum of 100. We
also excluded 20 kb genomic windows in which fewer than 30% of sites passed these
quality and depth criteria for a given species. Many of these windows fell in
pericentromeric regions of the genome.
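The site and window filters described above can be sketched as follows. This is an illustration only: the thresholds are taken from the text, but the inclusive comparisons, the 0-based positions, the species keys, and the choice to compute the passing fraction over the sites supplied for each window are all assumptions.

```python
from collections import defaultdict

# Depth limits (min, max) per species, as stated in the text.
DEPTH_LIMITS = {
    "bursa-pastoris": (20, 100),
    "grandiflora": (20, 100),
    "orientalis": (15, 100),
}
MIN_QUAL = 40             # PHRED-scaled quality threshold
WINDOW = 20_000           # window size in bp
MIN_PASS_FRACTION = 0.30  # windows below this fraction are excluded


def site_passes(qual, depth, species):
    """True if a site meets the quality and depth criteria (inclusive bounds assumed)."""
    lo, hi = DEPTH_LIMITS[species]
    return qual >= MIN_QUAL and lo <= depth <= hi


def windows_to_exclude(sites, species):
    """sites: iterable of (position, qual, depth) on one chromosome.
    Returns the indices of windows in which too few sites pass."""
    total = defaultdict(int)
    passed = defaultdict(int)
    for pos, qual, depth in sites:
        window = pos // WINDOW
        total[window] += 1
        passed[window] += site_passes(qual, depth, species)
    return {w for w in total if passed[w] / total[w] < MIN_PASS_FRACTION}
```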

To identify transposable element insertions in C. bursa-pastoris we used the
PoPoolation TE approach developed by Kofler et al. (9). Using default settings, we
ran the pipeline on paired-end Illumina (108 bp) samples from 8 of our C. bursa-pastoris
individuals, using the C. rubella genome (2) as the reference genome. Since
PoPoolation TE was originally developed for pooled population samples, in order to
infer population-wide frequencies, we modified it to apply to individual samples,
using frequency cutoffs to define heterozygous and homozygous insertions (J.A.
Agren, W. Wang, D. Koenig, B. Neuffer, D. Weigel and S.I. Wright, in revision).
Insertion presence/absence in C. bursa-pastoris was analyzed by principal components
analysis together with insertions previously identified in our samples of C.
orientalis and C. grandiflora (J.A. Agren, W. Wang, D. Koenig, B. Neuffer, D. Weigel
and S.I. Wright, in revision) using SPSS v. 22 (SPSS Inc. 2009) under default settings.



Phasing of the C. bursa-pastoris homeologs was conducted with HapCUT v0.6
(10), which has been used to successfully phase tetraploid wheat (11). HapCUT was
run on SNPs passing our stringent heterozygosity and depth filters, so the
resulting phased blocks were composed of high-confidence polymorphisms.
Each C. bursa-pastoris sample was phased individually, and phased blocks were then
classified, based on SNPs shared with the progenitors, as descended from C.
grandiflora, descended from C. orientalis, discordant between the two, or of
unknown origin. SNPs
identified within these blocks were compared across all samples, and any block
containing a SNP inconsistently assigned to different homeologs was removed from
downstream analyses. This procedure could lead us to globally underestimate
polymorphism levels in C. bursa-pastoris, but should not bias our inference of
selection or affect inference of biased fractionation, as no particular category of
SNPs is preferentially removed and both homeologs are expected to be equally
affected. In total, 523,425 SNPs were used to infer the parental
origin of the phased blocks, with 16% of these resulting in discordant assignments
across samples. The procedure was validated by comparison to Sanger sequence
data for six independently amplified and sequenced genomic fragments from the
two homeologs. Principal components analysis of the phased SNPs was run on the
resulting dataset using SPSS v. 22 (SPSS Inc. 2009) under default settings, with
analysis restricted to common SNPs at a frequency of 6 or more across the entire
dataset.
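The assignment of phased blocks to a progenitor can be sketched as below. The data representation is hypothetical: each SNP on a phased haplotype is described by the set of progenitor species ("Cg" for C. grandiflora, "Co" for C. orientalis) that share the allele it carries, and only SNPs shared with exactly one progenitor are treated as informative. The actual decision rules used in the study may differ.

```python
def classify_block(snps):
    """snps: list of sets, one per SNP on a phased haplotype block.
    Each set holds the progenitors ('Cg', 'Co') sharing the allele
    carried on this haplotype. Returns 'Cg', 'Co', 'discordant' or 'unknown'."""
    origins = set()
    for shared_with in snps:
        # Only alleles private to one progenitor are informative here.
        if shared_with == {"Cg"}:
            origins.add("Cg")
        elif shared_with == {"Co"}:
            origins.add("Co")
    if not origins:
        return "unknown"
    if len(origins) == 2:
        return "discordant"
    return origins.pop()
```

A block carrying only C. grandiflora-specific alleles is labelled "Cg", whereas a block mixing alleles specific to each progenitor is labelled "discordant".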

Comparative Genomics Analyses 



A multiple whole-genome alignment between the C. rubella reference
genome (2) and Illumina fragment assemblies of C. bursa-pastoris, C. grandiflora (2),
C. orientalis (using Ray v. 1.4, J.A. Agren, W. Wang, D. Koenig, B. Neuffer, D. Weigel
and S.I. Wright, in revision) and Neslia paniculata (2) was conducted essentially as
described by Haudry and colleagues (7). Briefly, each fragment assembly was
initially aligned against the soft-masked (RepeatMasker, www.repeatmasker.org) C.
rubella reference sequence using Lastz (12) with parameters --gapped --nochain
--gfextend --strand=both. Alignments were then chained using axtChain (Kent tools,
UCSC) with a minimum chain score of 10,000 and a slightly customized linear gap
table. From these chains we selected the subset of most likely orthologous chains,
those having the maximum alignment score against the C. rubella reference, retaining only
a single chain for each C. rubella sequence for all assemblies except C. bursa-pastoris,
in which up to two chains could be selected provided there was sufficient evidence
for two good orthologous alignments. Finally, the individual alignments against C.
rubella were iteratively merged by phylogenetic distance using multiz (13) with
default parameters to create a multiple alignment.

We constructed maximum-likelihood phylogenies for alignments of each
assembled fragment that included two distinct C. bursa-pastoris sequences as well as
the orthologous C. grandiflora, C. rubella, C. orientalis, and Neslia paniculata
sequences. We constructed phylogenies using RAxML's (14) rapid bootstrap
algorithm to find the best-scoring ML tree. Each phylogeny had 100 bootstrap
replicates and used N. paniculata as the outgroup. We excluded trees with less than
80% bootstrap support at any branch from further analysis. We then used a custom
Perl script to count the number of resulting phylogenies corresponding to each
topology.
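The filtering and counting step can be illustrated with a short Python sketch (the original analysis used a custom Perl script, which is not reproduced here). The input format is an assumption made for illustration: each tree is given as a list of (clade, bootstrap support) pairs, one per internal branch, and two trees share a topology when they contain the same set of clades.

```python
from collections import Counter


def count_topologies(trees, min_support=80):
    """trees: list of trees, each a list of (clade, support) pairs, where
    clade is an iterable of the taxon names below one internal branch.
    Trees with any branch below min_support are skipped."""
    counts = Counter()
    for tree in trees:
        if any(support < min_support for _, support in tree):
            continue
        # Order-independent representation of the tree's clades.
        topology = frozenset(frozenset(clade) for clade, _ in tree)
        counts[topology] += 1
    return counts
```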

We used two approaches to validate our phylogenetic inference. First, we 
assessed whether similar patterns were observed with Sanger data and larger 
sample sizes for C. bursa-pastoris. Second, we assessed whether patterns of fixed 
heterozygosity and fixed differences between the diploid putative ancestors were in 
agreement with expectations under our phylogenetic hypothesis. 

For the first validation, we assessed phylogenetic patterns at nine
independent nuclear genes for which both homeologs have previously been amplified
and sequenced in C. bursa-pastoris with homeolog-specific primers (15-17). We
amplified and sequenced the same loci in C. orientalis using these primers
and used Muscle (18) to align our sequences to publicly available data for the
same loci from C. grandiflora (sequences phased using PHASE 2.1 (19, 20)), C. rubella
and C. bursa-pastoris (16, 17, 21-25) (Table S4). As outgroups, we used
Arabidopsis thaliana and/or N. paniculata. All positions with gaps or missing data
were removed and the data for each locus were collapsed into unique haplotypes using
FaBox 1.40 (http://users-birc.au.dk/biopv/php/fabox/). Subsequently, neighbor-joining
trees were reconstructed in MEGA5 (26), with distances estimated using
the maximum composite likelihood method (27) and support evaluated using 1000
bootstrap replicates.



Population Genetic Analyses 
Demographic inference 



To infer demographic parameters associated with allopolyploid speciation in
C. bursa-pastoris, we analyzed site frequency spectra for 60,225 SNPs at intergenic
non-conserved regions and four-fold synonymous sites. Specifically, we used
fastsimcoal2.1 (28) to infer demographic parameters based on the multidimensional
site frequency spectrum for C. grandiflora, C. orientalis, and the two C. bursa-pastoris
homeologous genomes. Estimates were obtained using the composite maximum
likelihood approach, under four models that differed in the type of population size
change (stepwise or exponential) allowed, and in the presence or absence of
post-polyploidization asymmetric migration (Figure 3). All parameter estimates were
global maximum likelihood estimates from 50 independent fastsimcoal2.1 runs,
with a minimum of 50,000 and a maximum of 250,000 coalescent simulations, and
10-40 cycles of the likelihood maximization algorithm. Multidimensional SFS entries
with fewer than 5 SNPs were pooled to avoid negative effects on the estimation
procedure, as suggested by Excoffier and colleagues (28). We assumed a mutation
rate of 7*10^-9 (29) and a generation time of 1 year when converting estimates to
units of years and individuals. Confidence intervals of parameter estimates were
obtained by parametric bootstrapping, with 100 bootstrap replicates per model.
Model fit was assessed using the Akaike information criterion and Akaike's weight
of evidence, as in (28). Note that, because of possible linkage disequilibrium (LD) among
our SNPs, particularly in selfing populations, confidence intervals and the strength
of AIC model support should be treated with some caution. However, the parameter
estimates themselves under the composite likelihood approach are expected to be
robust, and we obtained comparable timing estimates across models (Figure 4) and
when using different subsets of sites with likely differences in their LD structure
(nonconserved noncoding sequence and 4-fold degenerate sites), arguing that our
main conclusions are not affected by LD between SNPs. To further
investigate the possible influence of linkage disequilibrium on the demographic
inference, we re-ran the models excluding sites less than 10 kb apart, and the major
conclusions, including the best-fitting model, were found to be unaffected.
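For reference, the model comparison by AIC and Akaike weights follows the standard formulas, sketched here in Python. The function names are hypothetical, and the log-likelihoods are assumed to be natural logarithms; likelihoods reported in log10 units would first need to be multiplied by ln(10).

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion from a natural-log likelihood."""
    return 2 * n_params - 2 * log_likelihood


def akaike_weights(aic_values):
    """Relative weight of evidence for each model in a candidate set."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]
```

The weights sum to one across the candidate models, and the model with the lowest AIC receives the largest weight.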

Estimating the distribution of fitness effects 

The distribution of fitness effects (DFE) of deleterious mutations was
estimated using the maximum likelihood approach designed by Keightley and
Eyre-Walker (30), which uses the allele frequency spectrum of polymorphic sites to infer
the strength of purifying selection (see estimates of polymorphism above).

The subset of sites considered comprised those that passed filtering criteria within
all three Capsella species. Frequency data were taken from C. bursa-pastoris, C.
grandiflora, and C. orientalis for several site classes: 0-fold non-synonymous, 4-fold
synonymous, and conserved noncoding sites (CNCs) that were identified in a
comparison of nine Brassicaceae species using the Capsella reference genome as the
reference for alignment (8). Sites that were heterozygous in all individuals of C.
bursa-pastoris (hereafter referred to as 'fixed heterozygotes') were not included in
these analyses, as they are not segregating within either homeolog. To ensure an
accurate comparison between species, the C. grandiflora dataset was downsampled
to 10 individuals. Heterozygous sites in C. grandiflora were converted to
homozygotes by randomly selecting one of the two bases, in order to mitigate any
bias caused by higher heterozygosity in this species. To determine 95% confidence
intervals, 200 bootstrap replicates of each site class were used to recalculate the
folded allele frequencies and numbers of divergent sites. Significance was
determined as described in Keightley & Eyre-Walker (30). To account for
uncertainty due to linkage disequilibrium when estimating the DFE, significance
tests and bootstrap confidence intervals were generated by resampling 10 kb blocks
of SNPs.
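Resampling blocks rather than individual SNPs keeps linked sites together in each replicate. A minimal sketch of such a block bootstrap is given below; the fixed, non-overlapping 10 kb blocks defined by integer division of the position, the single-chromosome input and the function name are assumptions made for illustration.

```python
import random


def block_bootstrap(snp_positions, block_size=10_000, n_replicates=200, seed=0):
    """Yield bootstrap replicates of SNP positions, resampling whole
    blocks of block_size bp with replacement."""
    rng = random.Random(seed)
    blocks = {}
    for pos in snp_positions:
        blocks.setdefault(pos // block_size, []).append(pos)
    block_ids = sorted(blocks)
    for _ in range(n_replicates):
        sampled = rng.choices(block_ids, k=len(block_ids))
        yield [pos for block in sampled for pos in blocks[block]]
```

Each replicate would then be used to recompute the folded frequency spectrum and the number of divergent sites before refitting the DFE.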

Inference of Selective Effects and Functional Enrichment 
Inference of gene loss 

We took two distinct approaches to identify putative deletion events. First,
we used HTSeq (31) to count the number of reads mapping to each annotated gene
in the Capsella reference genome, using the 'intersection-nonempty' option. After
normalizing each sample by the total number of reads, we identified genes that
showed significant reductions in coverage in the C. bursa-pastoris samples in
tests against both C. grandiflora and C. orientalis. Significance was assessed using 2-tailed
t tests assuming unequal variance. Only tests with p values less than 0.01 against
both species (corresponding to a false discovery rate, FDR, of approximately 5%)
were treated as significant. Furthermore, to restrict our analyses to putative
single-copy deletion events rather than variance in read mapping success and/or high copy
number genes, we only considered cases where the fold change in coverage in C.
bursa-pastoris ranged from 0.25 to 0.65.
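The combined significance and fold-change criteria can be sketched as a simple predicate. This is an illustration under stated assumptions: the coverages are mean read counts already normalized by library size, the p values come from separately computed unequal-variance t tests, and the fold change is evaluated against each diploid species in turn (the text does not say whether a pooled diploid mean was used instead).

```python
def is_candidate_deletion(cbp_cov, cg_cov, co_cov, p_cg, p_co,
                          alpha=0.01, fold_range=(0.25, 0.65)):
    """True if a gene shows a significant coverage reduction in
    C. bursa-pastoris against both diploids, with a fold change
    inside fold_range (loss of one of two homeologous copies
    would be expected to roughly halve coverage)."""
    if p_cg >= alpha or p_co >= alpha:
        return False
    low, high = fold_range
    for diploid_cov in (cg_cov, co_cov):
        if diploid_cov <= 0:
            return False
        if not low <= cbp_cov / diploid_cov <= high:
            return False
    return True
```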



As a second approach, we used Pindel (32) to identify large deletions
spanning whole genes and small insertion/deletion events affecting coding regions.
Pindel was run for each sample independently, and the calls were compared to the gene
annotation. Gene deletions were called as deletions covering at least 80% of a locus.
Overlapping variants between individuals and species were identified with the
bedtools intersect command. Gene deletions and inversions required 80% overlap
to be called as orthologous; shorter indels required complete overlap.
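The overlap criteria can be made concrete with a small sketch. Coordinates are assumed to be half-open (start, end) intervals on the same chromosome, and the 80% requirement between variants is assumed to be reciprocal; neither detail is stated in the text, and the function names are hypothetical.

```python
def overlap_length(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def covers_gene(del_start, del_end, gene_start, gene_end, min_fraction=0.8):
    """True if the deletion covers at least min_fraction of the gene."""
    shared = overlap_length(del_start, del_end, gene_start, gene_end)
    return shared / (gene_end - gene_start) >= min_fraction


def same_variant(a, b, min_fraction=0.8):
    """True if intervals a and b overlap reciprocally by min_fraction."""
    shared = overlap_length(a[0], a[1], b[0], b[1])
    return (shared / (a[1] - a[0]) >= min_fraction
            and shared / (b[1] - b[0]) >= min_fraction)
```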

Effect Prediction of Polymorphisms

The software package SnpEff v3.5 was used to predict the genomic effects of
SNPs and structural variants, given the Capsella rubella genome annotation (33).
Mutations more likely to cause loss-of-function effects were identified by using the
option "-lof" and parsing for mutations flagged for "HIGH" effect. This set included
polymorphisms knocking out splice sites, start or stop codons, or causing the gain of
stop codons, as well as frameshift deletions and short insertions. Possible
compensatory mutations for all of these putatively deleterious mutations were
accounted for within 50 bp of each mutation. For instance, frameshift mutations
within the same gene and within 50 bp of each other were excluded from the
analysis. Polymorphisms fixed between C. bursa-pastoris homeologs (segregating at
exactly 50% frequency) were included as a separate category, in order to test for the
fixation of deleterious mutations within homeologs. Since all mutations were called
relative to the C. rubella assembly, all mutations were polarized by using one Neslia
paniculata individual as an outgroup. Any putatively deleterious mutation also
found in N. paniculata was excluded from the analysis.
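The frameshift example given above can be sketched as follows. The input format (a list of (gene, position) tuples) and the function name are hypothetical; the sketch only applies the distance rule stated in the text and does not check whether the combined indels actually restore the reading frame.

```python
def drop_compensated_frameshifts(frameshifts, max_distance=50):
    """frameshifts: list of (gene, position) tuples.
    Returns the frameshifts that have no other frameshift in the
    same gene within max_distance bp."""
    by_gene = {}
    for gene, pos in frameshifts:
        by_gene.setdefault(gene, []).append(pos)
    kept = []
    for gene, pos in frameshifts:
        has_neighbour = any(
            other != pos and abs(other - pos) <= max_distance
            for other in by_gene[gene]
        )
        if not has_neighbour:
            kept.append((gene, pos))
    return kept
```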

Functional Category Enrichment 

Gene ontologies (GO), the classification of genes into classes of molecular
function, cellular component and biological process, were inferred from
Arabidopsis thaliana, using the Virtual Plant online server (version 1.3, (34)). Since
Capsella is closely related to Arabidopsis, the majority of orthologous genes in A.
thaliana are likely to have the same function in the Capsella species. A total of
19,520 genes in Arabidopsis that have known orthologs in C. rubella were included
in this analysis.

Genes containing putatively deleterious SNPs, according to the SnpEff 
algorithm, in C. bursa-pastoris were analyzed with Virtual Plant's BioMaps tools 
(34). Single nucleotide polymorphisms fixed on a single homeolog ("fixed 
heterozygotes") were considered separately. Genes associated with GO categories 
that were found to be enriched among singletons retained following the ancient 
WGD event in Arabidopsis (35) were acquired from the Virtual Plant database. Only 
GO categories associated with fewer than one thousand genes were included for
downstream analyses. Categories including organelle (6778 genes), cell part (14508
genes) and cellular metabolic process (5998 genes) are examples of excluded groups.
These GO terms are parents of the smaller GO classes included, so their removal
likely reduced noise without compromising statistical power.
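To make the enrichment counts concrete, the sketch below computes a fold enrichment and a one-sided hypergeometric tail probability from counts of the form reported in Table S2 (genes with the GO term among the mutated set, and among the background). The hypergeometric test is used here as an assumption for illustration; the statistic actually applied by the BioMaps tool, and its multiple-testing correction, may differ.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Ratio of the term's frequency in the test set (k of n genes)
    to its frequency in the background (K of N genes)."""
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from
    N genes, K of which carry the annotation."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)
```

For example, `fold_enrichment(36, 1856, 163, 17573)` corresponds to the "Response to DNA damage stimulus" row for C. bursa-pastoris and gives roughly a two-fold enrichment. The uncorrected tail probability from the second function is not directly comparable to the FDR values in the table.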



Scanning for Selection 

Polymorphisms at 4-fold synonymous sites were counted in 1000, 10 000
and 50 000 base-pair windows in coding regions upstream and downstream of
selected putatively deleterious mutations in C. bursa-pastoris, and windows were
averaged over mutations of the same category. The number of these neutral
polymorphisms was normalized by divergence to N. paniculata, which
controls for differences in mutation rate across the genome. The SNPs
fixed on a single homeolog were considered separately. Singleton genes whose
orthologs were retained following the ancient genome duplication in Arabidopsis
were scanned (35), as were related genes from the same GO categories. Finally,
nuclear genes associated with the chloroplast were considered separately, since
they were significantly enriched for loss-of-function mutations in past analyses.
VCFtools v0.1.12a was also used to assess nucleotide diversity over all site types in
the same size windows surrounding fixed putatively deleterious mutations using the
"--window-pi" option. Neutral expectations of diversity surrounding a given fixation
were generated by analyzing diversity in 50 kb windows surrounding 4-fold
synonymous fixations. 95% CIs for each window were computed through
bootstrapping by substitution (n=1000). The result shown in Figure S5 is robust to all
window sizes, although noisier at smaller window sizes.
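The normalization and confidence interval steps can be sketched as below. This is a minimal illustration: the percentile bootstrap of the mean, with resampling of windows with replacement, is an assumption about how the intervals were computed, and the function names are hypothetical.

```python
import random


def normalized_diversity(n_polymorphic, n_divergent):
    """Polymorphism count scaled by divergence to the outgroup."""
    return n_polymorphic / n_divergent if n_divergent else float("nan")


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Here `values` would hold the normalized diversity of one window position across all mutations of a given category.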




Table S2 - Enrichment tests of putative inactivating mutations

| Species | Mutation set | GO class | ID | Term | Enrichment | Background | FDR |
|---|---|---|---|---|---|---|---|
| C. bursa-pastoris | Stop codon gained | BP | GO:0006259 | DNA metabolic process | [illegible]/1856 | 284/17573 | [illegible] |
| C. bursa-pastoris | Stop codon gained | BP | GO:0006310 | DNA recombination | 18/1856 | [illegible] | 0.01 |
| C. bursa-pastoris | Stop codon gained | BP | GO:0006974 | Response to DNA damage stimulus | 36/1856 | 163/17573 | 0.02 |
| C. bursa-pastoris | Stop codon gained | BP | GO:0022402 | Cell cycle process | 26/1856 | 103/17573 | 0.03 |
| C. bursa-pastoris | Stop codon gained | BP | GO:0006508 | Proteolysis | 80/1856 | 489/17573 | 0.03 |
| C. bursa-pastoris | Stop codon gained | MF | GO:0003824 | Catalytic activity | 817/[illegible] | 6278/17843 | [illegible] |
| C. bursa-pastoris | Splice site lost | BP | GO:0006259 | DNA metabolic process | 44/[illegible] | 284/17573 | [illegible] |
| C. bursa-pastoris | Splice site lost | MF | GO:0003824 | Catalytic activity | 585/1429 | 6278/17843 | 0.001 |
| C. grandiflora | Stop codon gained | BP | GO:0006259 | DNA metabolic process | 33/837 | 284/17573 | 0.001 |
| C. grandiflora | Stop codon gained | BP | GO:0006281 | DNA repair | 19/837 | 152/17573 | 0.03 |
| C. grandiflora | Stop codon gained | BP | GO:0006974 | Response to DNA damage stimulus | 20/837 | 163/17573 | 0.03 |
| C. grandiflora | Stop codon gained | MF | GO:0003824 | Catalytic activity | 391/860 | 6278/17843 | 5E-07 |
| C. grandiflora | Splice site lost | MF | GO:0003824 | Catalytic activity | 323/749 | 6278/17843 | 0.003 |



Table S3. Geographical origin and collector information for all plant material used in this study.

| Species | Sample designation | Locality | Country | Collector |
|---|---|---|---|---|
| C. bursa-pastoris | 70.5 | Artemida | Greece | Kate St Onge |
| C. bursa-pastoris | 5.16 | Valladolid | Spain | Santiago Gonzalez-Martinez |
| C. bursa-pastoris | 13.16 | Krakow | Poland | Sandra Sherwood |
| C. bursa-pastoris | 39-12-28 | Bacia | Italy | Kate St Onge |
| C. bursa-pastoris | 12.4 | Halle | Germany | Walter Durka |
| C. bursa-pastoris | 16.9 | Nijmegen | Netherlands | Koen Verhoeven |
| C. bursa-pastoris | VLA | Vladivostok | Russia | Martin Lascoux |
| C. bursa-pastoris | PL | Puli, Taiwan | China | Y-W Yang |
| C. bursa-pastoris | SE14 | Harnosand | Sweden | Svante Holm |
| C. bursa-pastoris | RK32 | Reykjavik | Iceland | John Paul Foxe |
| C. orientalis | 1719-3 | Bayan-Olgiy Aymag | Mongolia | Barbara Neuffer, Herbert Hurka |
| C. orientalis | 1719-4 | Bayan-Olgiy Aymag | Mongolia | Barbara Neuffer, Herbert Hurka |
| C. orientalis | 1979-1 | Siberia, Altai Kraj | Russia | D.A. German, N. Friesen |
| C. orientalis | 1979-7 | Siberia, Altai Kraj | Russia | D.A. German, N. Friesen |
| C. orientalis | 1981-6 | Siberia, Altai Kraj | Russia | D.A. German, N. Friesen |
| C. orientalis | 1981-10 | Siberia, Altai Kraj | Russia | D.A. German, N. Friesen |
| C. orientalis | 1985-1 | Siberia, Altai Kraj | Russia | D.A. German |
| C. orientalis | 2008-1 | Xinjiang, Dzungaria | China | D.A. German et al. |
| C. orientalis | 2008-7 | Xinjiang, Dzungaria | China | D.A. German et al. |
| C. orientalis | 2008-9 | Xinjiang, Dzungaria | China | D.A. German et al. |



Table S4. GenBank accession numbers or PopSet numbers for publicly available
sequence data used in this study. Designations are as follows: Cbp A = C.
bursa-pastoris A homeologs, Cbp B = C. bursa-pastoris B homeologs, Cr = C. rubella,
Cg = C. grandiflora, and Np = N. paniculata.

| Locus | Species/homeolog | PopSet no./GenBank acc. no. | Reference |
|---|---|---|---|
| At1g77120 | Cbp A | 160334389 | (Slotte et al. 2008) |
| At1g77120 | Cbp B | 160334601 | (Slotte et al. 2008) |
| At1g77120 | Cr | 160334571 | (Slotte et al. 2008) |
| At1g77120 | Cg | 341865754 | (St Onge et al. 2011) |
| At5g10140 | Cbp A | 160335879 | (Slotte et al. 2008) |
| At5g10140 | Cbp B | 160336063 | (Slotte et al. 2008) |
| At5g10140 | Cr | 160335193 | (Slotte et al. 2008) |
| At5g10140 | Cg | 341866744 | (St Onge et al. 2011) |
| At4g02560 | Cbp A | 160336247 | (Slotte et al. 2008) |
| At4g02560 | Cbp B | 160336429 | (Slotte et al. 2008) |
| At4g02560 | Cr | 160335645 | (Slotte et al. 2008) |
| At4g02560 | Cg | 341865968 | (St Onge et al. 2011) |
| At4g02560 | Np | DQ343348.1 | (Slotte et al. 2006) |
| At4g00650 | Cbp A | 160335277 | (Slotte et al. 2008) |
| At4g00650 | Cbp B | 160335461 | (Slotte et al. 2008) |
| At4g00650 | Cr | 160335235 | (Slotte et al. 2008) |
| At4g00650 | Cg | FJ650267.1, FJ650266.1, FJ650265.1, FJ650264.1, FJ650263.1, FJ650262.1 | (Guo et al. 2009) |
| At1g03560 | Cbp A | 260780309 | (Slotte et al. 2009) |
| At1g03560 | Cbp B | 260765968 | (Slotte et al. 2009) |
| At1g03560 | Cr | 341606816 | (St Onge et al. 2011) |
| At1g03560 | Cg | 341605684 | (St Onge et al. 2011) |
| At1g15240 | Cg | JX065232.1-JX065247.1 | (St Onge et al. 2012) |
| At1g15240 | Cr | JQ418746.1-JQ418751.1 | (St Onge et al. 2012) |
| At1g15240 | Cbp | JQ418695.1-JQ418745.1 | (St Onge et al. 2012) |
| At2g26730 | Cg | FJ182827.1-FJ182845.1 | (Foxe et al. 2009) |
| At2g26730 | Cr | FJ182814.1-FJ182826.1 | (Foxe et al. 2009) |
| At2g26730 | Cbp | JQ418752.1-JQ418801.1 | (St Onge et al. 2012) |
| At4g08920 | Cg | 341606524 | (St Onge et al. 2011) |
| At4g08920 | Cr | 160334783; 341607428 | (Slotte et al. 2009; St Onge et al. 2011) |
| At4g08920 | Cbp A | 160334825; 260765934 | (Slotte et al. 2008; Slotte et al. 2009) |
| At4g08920 | Cbp B | 160335009; 261047491 | (Slotte et al. 2008; Slotte et al. 2009) |
| At5g51670 | Cg | FJ183276.1-FJ183293.1 | (Foxe et al. 2009) |
| At5g51670 | Cr | FJ183262.1-FJ183275.1; JQ418910.1-JQ418914.1 | (Foxe et al. 2009; St Onge et al. 2011) |
| At5g51670 | Cbp | JQ418859.1-JQ418909.1 | (St Onge et al. 2012) |



Table S5. Demographic parameter estimates with 95% confidence intervals for four models of allopolyploid speciation in Capsella. Estimates of
effective population sizes for C. grandiflora (Cg), C. orientalis (Co), C. bursa-pastoris (Cbp A and Cbp B) are given in thousands of individuals,
and estimates of the timing of the origin of C. bursa-pastoris (T1(Cbp)) and the split between C. grandiflora and C. orientalis (T2(Cg-Co)) are
given in kya. Note that for models with exponential population size change, effective population sizes correspond to current effective population
sizes.

| Model | Ne Cg | Ne Co | Ne CbpA | Ne CbpB | T1(Cbp) | T2(Cg-Co) | 2Nm(Cg-CbpA) | rCg | rCo | rCbp |
|---|---|---|---|---|---|---|---|---|---|---|
| Stepwise change | 529 (184-692) | 54 (17-67) | 52 (18-71) | 104 (42-131) | 184 (63-251) | 736 (237-981) | | | | |
| Stepwise change with migration | 531 (230-552) | 57 (24-60) | 6 (2-6) | 102 (45-109) | 197 (80-211) | 784 (335-847) | 0.14 (0.04-0.52) | | | |
| Exponential change | 840 (148-868) | 4 (1-6) | 37 (6-55) | 75 (12-101) | 128 (22-177) | 931 (161-1159) | | -4.1*10^-6 (-3.5*10^-5 to -1.3*10^[illegible]) | 2.6*10^-5 (1.6*10^-5 to 1.4*10^[illegible]) | 4.8*10^-7 (-4.0*10^-5 to 9.4*10^-6) |
| Exponential change with migration | 946 (383-1077) | 7 (4-14) | 5 (3-6) | 56 (53-130) | 164 (100-269) | 914 (448-1188) | 0.12 (0.06-0.34) | -4.4*10^[illegible] (-6.2*10^[illegible] to -1.3*10^[illegible]) | 1.6*10^-5 (4.3*10^[illegible] to 2.3*10^-5) | 1.0*10^-5 (-7.3*10^[illegible] to 6.8*10^[illegible]) |



